HF upload guide update #65

Merged
egrace479 merged 8 commits into main from hf-upload on Apr 27, 2026

Conversation

egrace479 (Member) commented Apr 25, 2026

Revises the Hugging Face dataset upload guide to better reflect the preferred methods.

Still ToDo:

  • refine text
  • cut extra old content
  • run linter (most errors should be resolved by finishing the first two items)

Closes #44

egrace479 requested a review from hlapp April 25, 2026 17:27
egrace479 added the enhancement and structure labels Apr 25, 2026
hlapp (Member) left a comment

@egrace479 looks pretty good. See two edits for clarification (though I'm not 100% sure these are right).

The PR is marked as draft and it looks like there may still be some placeholders (in the form of ...s), so making this a comment. If instead you meant it to be ready for merging, remove the draft status and re-assign me.

Comment thread docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md Outdated
Comment thread docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md Outdated
hlapp (Member) commented Apr 26, 2026

I'm also wondering how much of the git lfs part we should leave in here. It seems the cases where this is best are fairly limited, so maybe the test is: can we clearly articulate the nature of those cases (e.g., wanting full version control access)? If we're having trouble with that, we could cut that section down to merely mentioning that it exists and linking to the docs.

egrace479 (Member, Author) commented

> I'm also wondering how much of the git lfs part we should leave in here. It seems the cases where this is best are fairly limited, so maybe the test is: can we clearly articulate the nature of those cases (e.g., wanting full version control access)? If we're having trouble with that, we could cut that section down to merely mentioning that it exists and linking to the docs.

I opened the draft PR so you could see what I had so far (since you had this issue in mind). I hadn't removed all the old content yet, instead moving most of it further down the page in case I wanted to pull more for the main content. This is also why (as noted in your first comment) there are placeholders.

I will re-read tomorrow, but am happy for any feedback. My current plan:

I think lines 89-114 (git lfs and gitattributes content) should probably be deleted, while lines 116-126 should fall under an "other topics of note" style heading or be worked in as notes at relevant locations further up. HF is quite picky with merge conflicts, so I want to make sure to include a warning about that.

hlapp (Member) commented Apr 27, 2026

This sounds good to me. The other remaining point I noticed when reviewing this is that it only talks about dataset repos on HF. But the guidance (or at least most of it?) applies just as much to model repos.

Presumably we don't want to have near-duplicates of this for dataset and model repo guidance. Would the idea be to create a separate model repo guidance that by and large refers to this page asking the reader to replace the concept of "dataset" with "model"? Or would it be better to have a single page that in the (hopefully very few) places where it matters distinguishes between dataset repo and model repo type? (One pending project candidate that needs this guidance in fact needs it for a model repo.)

egrace479 (Member, Author) commented

> This sounds good to me. The other remaining point I noticed when reviewing this is that it only talks about dataset repos on HF. But the guidance (or at least most of it?) applies just as much to model repos.
>
> Presumably we don't want to have near-duplicates of this for dataset and model repo guidance. Would the idea be to create a separate model repo guidance that by and large refers to this page asking the reader to replace the concept of "dataset" with "model"? Or would it be better to have a single page that in the (hopefully very few) places where it matters distinguishes between dataset repo and model repo type? (One pending project candidate that needs this guidance in fact needs it for a model repo.)

It could probably be refactored to a general HF upload guide, especially since I already referenced a model vs dataset vs space distinction. I think models are generally more standardized, so it should be simple enough to mostly point to the docs. The key points to note there are about different checkpoints getting their own repositories and then generating a collection.

hlapp (Member) commented Apr 27, 2026

The checkpoints sound a little less straightforward, so maybe best to defer refactoring to a subsequent PR and get the CLI guidance online now, even if in phrasing only for datasets?

egrace479 marked this pull request as ready for review April 27, 2026 21:47
egrace479 requested a review from hlapp April 27, 2026 21:50
hlapp (Member) left a comment

🚀

egrace479 merged commit 8a016e1 into main Apr 27, 2026
1 check passed
egrace479 deleted the hf-upload branch April 27, 2026 23:24
egrace479 added a commit to Imageomics/Imageomics-guide that referenced this pull request Apr 28, 2026
Pull from Collab Guide [PR 65](Imageomics/Collaborative-distributed-science-guide#65)

* Initial re-organization of page, includes integrity checks and more links to docs
WIP refining

* Add in links, fix typos

* Additional notes in sections, refs out

* Add updated screenshot with arrow
match UI change to 'contribute'

* Remove extra outdated content, update URLs for new versions, run linting

* Clarify UI upload cases

Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>

* Clarify CLI upload use case

Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>

* Add pointer for considerations with large folder uploads

---------

Co-authored-by: Hilmar Lapp <hlapp@drycafe.net>
Labels

enhancement (New feature or request), structure (Refactoring or architecture, general code organization)

Development

Successfully merging this pull request may close these issues.

Revise Hugging Face Upload Guide

2 participants